Abstract. We semantically mark up a corpus of LaTeX lecture notes and
expose them as Linked Data in XHTML+MathML+RDFa. Our application
makes the resulting documents interactively browsable for students.
Our ontology helps to answer queries from students and lecturers, and
paves the way towards an integration of our corpus with external sites.

1 Application: Computer Science Lecture Notes 

Over the last seven years, the second author has accumulated a large corpus 
of teaching materials, comprising more than 2,000 slides, about 1,000 homework
problems, and hundreds of pages of course notes, all written in LaTeX. The
material covers a general first-year introduction to computer science, graduate 
lectures on logics, and research talks on mathematical knowledge management. 
This situation is typical for educators and researchers and represents the state 
of the art in mathematics, physics, computer science, and engineering: LaTeX
has proven suitable for writing high-quality lecture notes and publishing them 
as PDF. However, in our educational setting, we would like to benefit from the 
much larger degree of interactivity that screen reading and e-books support. For 
example, while reading notes, students want to directly look up the meaning of a
symbol (e. g. ℕ) in a formula, or examples for a difficult concept (e. g. structural
induction). They may want to select advanced material for self-study from the 
whole body of lecture notes, based on the topics covered in the lecture. They 
want to use a search engine to find related material in other universities' online 
course notes, on mathematical web sites, or Wikipedia. Lecturers want to query 
their repository for document parts reusable in an upcoming lecture, given the 
prerequisites students are expected to meet and the material that has already 
been covered. In a course for a special audience, e. g. mathematics for physicists, 
they want to draw examples from that domain even though they are less familiar 
with it. They also want to locate didactic gaps, such as concepts without
examples, or unjustified proof steps. These services require semantic annotations
in the lecture notes that are understandable for external search engines. Plain
LaTeX is barely usable for anything beyond on-screen reading and printing. Even
simple semantic annotations are uncommon; rare exceptions are the \title
command, which makes its meaning explicit, and \frac{a}{b}, which focuses on
functional structure instead of visual layout. This is especially problematic for
symbols in formulae, which are often overloaded with multiple definitions or
presentable in different notations. $\binom{n}{k}$ can be a vector or a binomial
coefficient, and a French or Russian author would write the latter as $C_n^k$.
Therefore, we have developed a semantic
representation of mathematical knowledge in LaTeX and a presentation process
that preserves these semantic structures as Linked Data in the output, exposing 
them to mashups for interactive exploration, as well as semantic searching and 
querying. These are based on an ontology for mathematical knowledge so that 
mathematical content can be linked across different repositories. 
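
The separation of meaning and notation described above can be illustrated with a minimal Python sketch. It is not how the LaTeX-based representation is implemented; the names `Binomial`, `NOTATIONS`, and the notation keys are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binomial:
    """Semantic object: the binomial coefficient 'n choose k'."""
    n: str
    k: str


# Hypothetical notation table: one meaning, several presentations.
NOTATIONS = {
    "parenthesized": lambda b: rf"\binom{{{b.n}}}{{{b.k}}}",
    "french": lambda b: rf"C_{{{b.n}}}^{{{b.k}}}",
}


def render(obj: Binomial, notation: str) -> str:
    """Return LaTeX source for `obj` in the requested notation."""
    return NOTATIONS[notation](obj)
```

Calling `render(Binomial("n", "k"), "french")` produces the LaTeX source `C_{n}^{k}`, while the semantic object itself stays the same regardless of the notation chosen.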

2 Research Background and Related Work 

LaTeX's importance in scientific authoring and its extensibility by macros have
led to semantic extensions enabling modern publishing workflows. SALT
(Semantically Annotated LaTeX [8]) marks up rhetorical structures and fine-grained
citations in scientific documents. Its markup is not sufficiently fine-grained for
formulae, and its vocabulary is limited to rhetoric and citations and is not
extensible. Our own sTeX offers macros for introducing new mathematical symbols
and using arbitrary metadata vocabularies. Some math e-learning systems, such 
as ActiveMath [1] or MathDox [17], use semantic representations of formulae and 
higher-level structures, e. g. proof steps or course module dependencies, in the 
standard XML languages OpenMath [5] and OMDoc [11]. They utilize seman- 
tic structures but do not publish them in a standard representation like RDF, 
which would promote general-purpose queries beyond the built-in services and 
integration with other systems on the web. The Linking Open Data movement 
promotes best practices for publishing data on the web [9], as standalone RDF 
or embedded into HTML documents as RDFa [2]. Applications include Sindice, 
an engine that crawls and indexes Linked Data [19], and the Sparks O3 Browser, 
a mashup that utilizes RDFa annotations in HTML for interactive browsing [20]. 
Our interactive documents work similarly but additionally support annotations 
in MathML formulae. MathML pioneered embedded annotations long before
RDFa, albeit with a more limited scope. Its parallel markup interlinks the ren- 
dered appearance and the semantic structure of mathematical expressions; the 
meaning of mathematical symbols is usually defined in lightweight ontologies 
called OpenMath content dictionaries [4]. HELM (Hypertext Electronic Library 
of Mathematics [3]) pioneered representing structures of mathematical knowledge
in RDF, e. g. which mathematical theory introduces a symbol, which of its
properties have been declared or asserted, and how the latter are proved. The
HELM ontology has not gained wide acceptance, though. At the time of its de- 
velopment, there was no RDFa-like way of embedding RDF into web documents. 

3 Architecture and Demo 

Our architecture publishes semantically enriched LaTeX lecture notes as
XHTML+MathML+RDFa Linked Data. We kept LaTeX as an input language, as it is
familiar to authors and well supported by editors, and as high-quality PDF can
be obtained from it. With sTeX (semantically enhanced TeX), we have introduced
LaTeX macros for marking up the semantic structure of formulae and documents
[12] and manually annotated our complete corpus using the sTeX plugin
for the Emacs editor. One can, e. g., declare a symbol union, formally define it, 
and make its semantic representation \union{A , B , C} expand to A\cup B\cup C 
for human-readable rendering. There are environments for mathematical statements
and theories, e.g. \begin{example}[for=union]. LaTeXML transforms
this into a semantically equivalent intermediate XML representation, using the 
standard XML languages OpenMath for formulae [5] and OMDoc for higher-level 
structures [11]. Finally, our JOMDoc rendering library [10] generates
human-readable output from this XML, an output that still contains the full semantic
structure as annotations. A custom Java implementation renders formulae as
parallel markup of Presentation MathML annotated with OpenMath¹; rendering
higher-level structures as XHTML+RDFa [2] is implemented in XSLT. RDF
is extracted from XML by our Krextor XML→RDF library [15], which generates
URIs for all mathematical objects in a document. It uses our OMDoc ontology
(cf. [14]) as a vocabulary for representing all mathematical structures (e.g. "d
is a definition, e is an example for d") plus full text, inspired by HELM and 
designed as a more expressive counterpart of the OMDoc XML schema. 
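
The extraction step can be pictured with the following Python sketch, which turns a simplified OMDoc-like fragment into triples of the kind "d is a definition, e is an example for d". It is not Krextor; the ontology namespace `ONT`, the property name `exemplifies`, the base URI, and the input fragment are assumptions made for illustration, and the real OMDoc ontology terms may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; the real OMDoc ontology URIs may differ.
ONT = "http://example.org/omdoc-ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified OMDoc-like input (not schema-valid OMDoc).
SAMPLE = """
<theory xml:id="sets">
  <definition xml:id="union.def"/>
  <example xml:id="union.ex" for="union.def"/>
</theory>
"""


def extract_triples(xml_text, base):
    """Return a set of (subject, predicate, object) URI triples."""
    root = ET.fromstring(xml_text)
    triples = set()
    for elem in root.iter():
        ident = elem.get(XML_ID)
        if ident is None:
            continue  # only identified objects receive a URI
        subject = f"{base}#{ident}"
        # Type each identified element by its capitalized tag name.
        triples.add((subject, RDF_TYPE, ONT + elem.tag.capitalize()))
        target = elem.get("for")
        if elem.tag == "example" and target:
            triples.add((subject, ONT + "exemplifies", f"{base}#{target}"))
    return triples
```

For instance, `extract_triples(SAMPLE, "http://example.org/notes")` yields one type triple per identified element plus one triple linking the example to the definition it illustrates.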

The whole transformation process is integrated into our versioned XML 
database TNTBase [22]; see http://kwarc.info/LinkedLectures. TNTBase 
has a Subversion-compatible interface making it suitable as a lecture notes
repository. The TeX→XML and XML→RDF transformations are automatically
triggered by a hook upon committing a new revision of an sTeX lecture module. If
the generated OMDoc+OpenMath is not schema-valid, the commit is rejected.
TNTBase also follows Linked Data best practices and, depending on
the MIME type an HTTP client requests, serves a document as OMDoc, as
RDF (only a structural outline, not the full text and formulae), or as
XHTML+MathML+RDFa. The latter contains JavaScript code from our JOBAD library
for interactive documents [13,7], which operationalizes the annotations (Linked
Data and others) in the rendered documents. JOBAD's definition lookup determines
the OpenMath annotation of the Presentation MathML symbol the user
clicked on, from that obtains the URI of the symbol, and then requests XHTML 
from that URI (resulting in the symbol's declaration and definition), which is 
then displayed in a popup. The RDFa annotations are used for making parts of a 
document (e. g. steps of a structured proof) foldable, and for displaying the local 
neighborhood in the RDF graph (e. g. related examples) in popups; this is im- 
plemented using the rdfQuery library [18], relying on the Linked Data structure 
in the latter case. Further third-party services can be integrated in a mashup 
style; we have demonstrated this for a unit conversion service [13,7].

¹ A proposal for fully representing formulae in RDF [16] has not gained wide acceptance.
RDF-based reasoners are often limited to decidable first-order logic subsets,
which is insufficient for mathematical applications, and XML has a straightforward
notion of order (e.g. of the arguments of an operator or of a set constructor).

Besides enabling JOBAD's services, we have implemented machinery to load the extracted
RDF into a triple store and query it using SPARQL. We also provide a widget
for formulating queries without knowing SPARQL or the OMDoc ontology. It
allows asking some non-trivial queries, e. g. "find examples for all concepts from
graph theory (about which I'm planning a lecture), assuming as prerequisites
the concepts from formal languages (and their prerequisites)". This would yield
the parse tree of a context-free language as an example for the concept "tree",
but no example drawn from operating systems, as those were not among the prerequisites.
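
The logic behind such a query can be sketched in plain Python over a toy data set. This is not the SPARQL query or the widget from our system; the function names, the data layout, and the sample entries (including the `directory-tree` example) are hypothetical and only illustrate the filtering by transitive prerequisites.

```python
def closure(start, imports):
    """Return `start` plus every theory it transitively imports."""
    seen, stack = set(), [start]
    while stack:
        theory = stack.pop()
        if theory not in seen:
            seen.add(theory)
            stack.extend(imports.get(theory, ()))
    return seen


def find_examples(target, known, imports, home_theory, examples):
    """Examples for concepts of `target` that rely only on `known` material."""
    allowed = closure(known, imports) | closure(target, imports)
    hits = []
    for name, concept, uses in examples:
        if home_theory.get(concept) == target and set(uses) <= allowed:
            hits.append((concept, name))
    return sorted(hits)


# Hypothetical toy data: theory imports, concept homes, and examples.
IMPORTS = {"graph-theory": ["sets"], "formal-languages": ["sets"]}
HOME_THEORY = {"tree": "graph-theory"}
EXAMPLES = [
    ("parse-tree", "tree", ["formal-languages"]),
    ("directory-tree", "tree", ["operating-systems"]),
]
```

With this data, `find_examples("graph-theory", "formal-languages", IMPORTS, HOME_THEORY, EXAMPLES)` keeps the parse tree and drops the example that depends on operating systems.
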

Our demo shows the complete pipeline in action: (i) annotating a document
with our sTeX Emacs mode, (ii) committing it to TNTBase, (iii) automatic
translation to OMDoc, schema validation, and RDF extraction, (iv) loading the
extracted RDF data into a triple store, (v) retrieving the document in different
representations, (vi) browsing the XHTML+MathML+RDFa rendering, (vii) interacting
with the Linked Data in it, and (viii) querying a triple store. Additionally,
we will demonstrate the generation of PDF from the sources.
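
Step (v), retrieving a document in different representations, relies on HTTP content negotiation. The following Python sketch shows the general idea of selecting a representation from an Accept header. It is not the TNTBase implementation; the media type `application/omdoc+xml`, the representation labels, and the fallback default are assumptions.

```python
# Assumed mapping from media types to representations.
REPRESENTATIONS = {
    "application/omdoc+xml": "omdoc",
    "application/rdf+xml": "rdf",
    "application/xhtml+xml": "xhtml+mathml+rdfa",
}


def parse_accept(header):
    """Return acceptable media types from an Accept header, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        quality = 1.0
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if fields[0] and quality > 0:
            # Sort by descending quality, then by original position.
            entries.append((-quality, position, fields[0]))
    return [media for _, _, media in sorted(entries)]


def negotiate(header, default="xhtml+mathml+rdfa"):
    """Pick the representation for the best supported media type."""
    for media in parse_accept(header):
        if media in REPRESENTATIONS:
            return REPRESENTATIONS[media]
    return default
```

A client sending `Accept: application/rdf+xml` would thus receive the RDF outline, while a browser that names none of the supported types falls back to the XHTML+MathML+RDFa rendering in this sketch.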



4 Conclusion and Outlook 

Our architecture makes legacy LaTeX lecture notes available as Linked Data.
We expose these data to external clients but have also implemented services for
interactively exploring the XHTML+MathML+RDFa presentation of our data.
We are also working on preserving some of the semantics in the PDF output, as 
SALT does. Evaluation of our enriched lecture notes by the student end users 
is planned for the next semester. To the best of our knowledge, we are the first 
provider of RDF-based Linked Data in the domain of mathematics and among 
the first to operationalize the Linked Data structures of formula markup. Having
successfully transformed more than 300,000 normal, non-semantic LaTeX documents
from arxiv.org to XHTML+Presentation MathML [21] and working on
machinery for automatically annotating them using natural language processing,
we will soon be able to expose even more mathematical knowledge as Linked
Open Data; however, due to the inherent complexity of mathematical knowledge,
it will have a less formal semantics than manually annotated documents. Our lecture
notes are self-contained so far, but we are now starting to reap the benefits
of Linked Data by linking them to other data sets, e.g. DBpedia [6], whose 
mathematical knowledge does not have a semantics as strong as ours, but which 
provides abundant informal background knowledge, e. g. about the originators 
of mathematical theories. On the other hand, hardly any well-known mathematical
site (e.g. planetmath.org and mathworld.wolfram.com) currently exposes
machine-understandable metadata. We promote our technology, starting with 
lightweight RDFa annotation using the OMDoc ontology, as a migration path 
towards their integration into a true mathematical Semantic Web. 